步态描绘了个人独特而区别的步行模式,并已成为人类识别最有希望的生物识别特征之一。作为一项精细的识别任务,步态识别很容易受到许多因素的影响,并且通常需要大量完全注释的数据,这些数据是昂贵且无法满足的。本文提出了一个大规模的自我监督基准,以通过对比度学习进行步态识别,旨在通过提供信息丰富的步行先验和各种现实世界中的多样化的变化,从大型的无标记的步行视频中学习一般步态代表。具体而言,我们收集了一个由1.02m步行序列组成的大规模的无标记的步态数据集gaitu-1m,并提出了一个概念上简单而经验上强大的基线模型步态。在实验上,我们在四个广泛使用的步态基准(Casia-B,Ou-Mvlp,Grew and Grew and Gait3d)上评估了预训练的模型,或者在不转移学习的情况下。无监督的结果与基于早期模型和基于GEI的早期方法相当甚至更好。在转移学习后,我们的方法在大多数情况下都超过现有方法。从理论上讲,我们讨论了步态特异性对比框架的关键问题,并提供了一些进一步研究的见解。据我们所知,Gaitlu-1M是第一个大规模未标记的步态数据集,而GaitSSB是第一种在上述基准测试基准上取得显着无监督结果的方法。 GaitSSB的源代码将集成到OpenGait中,可在https://github.com/shiqiyu/opengait上获得。
translated by 谷歌翻译
步态识别旨在通过相机来识别一个距离的人。随着深度学习的出现,步态识别的重大进步通过使用深度学习技术在许多情况下取得了鼓舞人心的成功。然而,对视频监视的越来越多的需求引入了更多的挑战,包括在各种方差下进行良好的识别,步态序列中的运动信息建模,由于协议方差,生物量标准安全性和预防隐私而引起的不公平性能比较。本文对步态识别的深度学习进行了全面的调查。我们首先介绍了从传统算法到深层模型的步态识别的奥德赛,从而提供了对步态识别系统的整个工作流程的明确知识。然后,从深度表示和建筑的角度讨论了步态识别的深入学习,并深入摘要。具体而言,深层步态表示分为静态和动态特征,而深度体系结构包括单流和多流架构。遵循我们提出的新颖性分类法,它可能有益于提供灵感并促进对步态认识的感知。此外,我们还提供了所有基于视觉的步态数据集和性能分析的全面摘要。最后,本文讨论了一些潜在潜在前景的开放问题。
translated by 谷歌翻译
灵巧的操纵仍然是机器人技术中的一个空缺问题。为了协调研究界为解决这个问题的努力,我们提出了共同的基准。我们设计和构建了机器人平台,该平台托管在MPI上供智能系统托管,可以远程访问。每个平台由三个能够敏捷物体操纵的机器人手指组成。用户能够通过提交自动执行的代码(类似于计算群集)来远程控制平台。使用此设置,i)我们举办机器人竞赛,来自世界任何地方的团队访问我们的平台以应对具有挑战性的任务ii)我们发布了在这些比赛中收集的数据集(包括数百个机器人小时),而我们为研究人员提供了访问自己项目的这些平台。
translated by 谷歌翻译
专家混合物(MOE)由于其成功提高了模型质量,特别是在变压器方面的成功而变得流行。通过向几个专家提供稀疏门的令牌,每个专家只包含完整模型的一部分,Moe将模型尺寸保持不变,并且显着降低了每次标记计算,从而有效地缩放神经网络。但是,我们发现,目前的联合训练专家和稀疏门的方法引入了对模型精度的负面影响,缩短了昂贵的大规模模型训练的效率。在这项工作中,我们提出了用于MOE训练的密集至稀疏的门(DTS-Gate)。具体而言,代替使用永久稀疏门,DTS-Gate开始作为向所有专家路由令牌的密集栅极开始,然后逐渐和自适应地成为稀疏,而路线较少到更少的专家。与DTS-Gate的Moe自然地通过培训所有专家训练专家和稀疏门的训练,然后学习稀疏门。实验表明,与GPT-MOE(1.5B)模型中的最先进的开关门相比,使用OpenWeBtext数据集(40GB),DTS-Gate可以获得2.0倍的加速以达到相同的验证困惑,如以及更高的拖鞋 - 效率为1.42倍的加速。
translated by 谷歌翻译
用于将音频信号的光谱表示转换为波形的神经声学器是语音合成管道中的常用组件。它侧重于合成来自低维表示的波形,例如MEL-谱图。近年来,已经引入了不同的方法来开发这种声音。但是,评估这些新的声音仪并将其表达与以前的声学相比,它变得更具挑战性。为了解决这个问题,我们呈现VOCBENCH,这是一个框架,该框架是基于最先进的神经声码器的性能。 VOCBENCH使用系统研究来评估共享环境中的不同神经探测器,使它们能够进行公平比较。在我们的实验中,我们对所有神经副探测器的数据集,培训管道和评估指标使用相同的设置。我们执行主观和客观评估,以比较每个声码器沿不同的轴的性能。我们的结果表明,该框架能够为每种声学器提供竞争的疗效和合成样品的质量。 Vocebench框架可在https://github.com/facebookResearch/Vocoder-Benchmark中获得。
translated by 谷歌翻译
持续咳嗽是呼吸系统疾病的主要症状。通过可穿戴物品来检测咳嗽,特别是在Covid-19大流行期间,已经支付了越来越多的研究。在所有类型的传感器中,麦克风最广泛地用于检测咳嗽。然而,处理音频信号所需的强力消耗阻碍了对电池限制的商业可穿戴产品(例如耳塞)的连续音频咳嗽检测。我们呈现了利用较低功率传感器,惯性测量单元(IMU)的COUGHTRIGGER作为咳嗽检测激活器,以触发更高功率的传感器,用于音频处理和分类。它能够以最小的电池消耗运行作为备用服务,并在从IMU检测到候选咳嗽时触发基于音频的咳嗽检测。此外,IMU的使用带来了改善咳嗽检测特异性的益处。实验是对45个科目进行的,我们的IMU的模型达到了0.77 AUC评分,留出了一个主题的评价。我们还验证了其对自由生活数据的有效性,并通过设备实现。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译